Take-Home Exercise 1: Application of Spatial Point Patterns Analysis to discover geographical distribution of functional & non-functional water points in Osun State, Nigeria
pacman::p_load(sf, funModeling, maptools, raster, spatstat, tmap, tidyverse, sfdep)
1. Overview
Water is an important resource to mankind. Clean and accessible water is critical to human health. It provides a healthy environment, a sustainable economy, reduces poverty and ensures peace and security. Yet over 40% of the global population does not have access to sufficient clean water. By 2025, 1.8 billion people will be living in countries or regions with absolute water scarcity, according to UN-Water. The lack of water poses a major threat to several sectors, including food security. Agriculture uses about 70% of the world’s accessible freshwater.

Developing countries are most affected by water shortages and poor water quality. Up to 80% of illnesses in the developing world are linked to inadequate water and sanitation. Despite technological advancement, providing clean water to rural communities is still a major development issue in many countries globally, especially countries on the African continent.
To address the issue of providing clean and sustainable water supply to rural communities, a global Water Point Data Exchange (WPdx) project has been initiated. The main aim of this initiative is to collect water point related data from rural areas at the water point or small water scheme level and share the data via the WPdx Data Repository, a cloud-based data library. What is special about this project is that data are collected based on the WPDx Data Standard.
1.0 Objectives
Geospatial analytics holds tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate spatial point patterns analysis methods to discover the geographical distribution of functional and non-functional water points, and their co-locations if any, in Osun State, Nigeria.
1.1 Data Acquisition
Aspatial data
For the purpose of this assignment, data from WPdx Global Data Repositories will be used. There are two versions of the data. They are: WPdx-Basic and WPdx+. You are required to use WPdx+ data set.
Geospatial data
This study will focus on Osun State, Nigeria. The state boundary GIS data of Nigeria can be downloaded either from The Humanitarian Data Exchange portal or geoBoundaries.
2. Getting started
2.1 Installing and Loading the R packages
For take-home assignment 1, we will need to install the following packages:
The code chunk below checks that all the required packages are installed and, if not, installs them.
if (!require(sf)) {
install.packages("sf")
}
if (!require(funModeling)) {
install.packages("funModeling")
}
if (!require(maptools)) {
install.packages("maptools")
}
if (!require(raster)) {
install.packages("raster")
}
if (!require(spatstat)) {
install.packages("spatstat")
}
if (!require(tmap)) {
install.packages("tmap")
}
if (!require(tidyverse)) {
install.packages("tidyverse")
}
3. Data Wrangling: Geospatial Data & Aspatial Data
3.1 Importing geoBoundaries Data into R
In this section, st_read() of the sf package will be used to import the geoBoundaries-NGA geospatial data set into R.
geoNGA <- st_read("data/geospatial/",
layer = "geoBoundaries-NGA-ADM2")
Reading layer `geoBoundaries-NGA-ADM2' from data source
`C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
From the output message, we learn that:
Geometry type of geoBoundaries dataset is multipolygon
774 features and 5 fields
Assigned CRS is WGS 84 (geographic coordinate system)
Dimension is XY
3.2 Importing Geospatial NGA Data into R
In this section, st_read() of the sf package will be used to import the geospatial NGA data set into R.
We filter the data to Osun State only, as that is the study area for this assignment!
NGA <- st_read("data/geospatial/",
layer = "nga_admbnda_adm2_osgof_20190417") %>%
filter(ADM1_EN == "Osun") %>%
st_transform(crs = 26392)
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source
`C:\G2wongkelly\IS415_GAA\Take_Home_Ex\Take_Home_Ex01\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
From the output message, we learn that:
Geometry type of NGA dataset is multipolygon
774 features and 16 fields
Assigned CRS is WGS 84 (geographic coordinate system)
Dimension is XY
In geospatial analytics, we need to transform data that is in a geographic coordinate system (WGS 84) to a projected coordinate system. This is because a geographic coordinate system is not appropriate when the analysis requires distance and/or area measurements.
Therefore, we need to transform the NGA data set to a projected coordinate system by using st_transform() of the sf package (further elaborated in sections 3.3.1 and 3.3.2).
By examining both sf data frames closely, we notice that NGA provides both LGA and state information.
Hence, the NGA data frame will be used for the subsequent processing.
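The effect of such a transformation can be illustrated with a minimal sketch (assuming only the sf package, and using a hypothetical point near Osogbo rather than the actual data): in WGS 84 the coordinate units are degrees, while after st_transform() to EPSG:26392 they are metres, which is what distance and area calculations need.

```r
library(sf)

# Hypothetical point near Osogbo, Osun State, in decimal degrees (WGS 84)
pt_wgs84 <- st_sfc(st_point(c(4.56, 7.78)), crs = 4326)

# Units of the geographic CRS are degrees - unsuitable for distance/area work
st_crs(pt_wgs84)$units_gdal

# Reproject to the Minna / Nigeria Mid Belt projected CRS (EPSG:26392)
pt_proj <- st_transform(pt_wgs84, crs = 26392)

# Units are now metres, so distance and area measurements are meaningful
st_crs(pt_proj)$units_gdal
```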
3.3 Importing Aspatial Data into R
In this section, read_csv() will be used to import the aspatial data set into R, and we filter to only the Nigeria, Osun data rows, as those are what we are interested in analysing for this project.
wp_nga <- read_csv("data/aspatial/WPdx.csv") %>%
filter(`#clean_adm1` == "Osun") %>%
filter(`#clean_country_name` == "Nigeria")
3.3.1 Converting Water Point Data into SF Point Features
Step 1: Convert the wkt field into an sfc field by using st_as_sfc().
wp_nga$Geometry = st_as_sfc(wp_nga$`New Georeferenced Column`)
wp_nga
# A tibble: 5,745 × 75
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 225950 Federal Minis… 7.43 4.26 05/05/… Yes Boreho… Well Hand P…
2 225524 Federal Minis… 7.78 4.56 04/22/… Yes Protec… Well Hand P…
3 197014 Federal Minis… 7.49 4.53 04/30/… Yes Boreho… Well Mechan…
4 225173 Federal Minis… 7.93 4.73 05/02/… Yes Boreho… Well Hand P…
5 225843 Federal Minis… 7.74 4.44 05/08/… Yes Boreho… Well Hand P…
6 235508 Federal Minis… 7.15 4.64 04/27/… Yes Protec… Well Hand P…
7 197708 Federal Minis… 7.87 4.72 05/13/… Yes Boreho… Well Mechan…
8 195041 Federal Minis… 7.73 4.45 06/17/… Yes Protec… Spring <NA>
9 225222 Federal Minis… 7.81 4.15 05/14/… Yes Protec… Spring Mechan…
10 460770 GRID3 7.4 4.33 06/13/… Unknown Boreho… Well <NA>
# … with 5,735 more rows, 66 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
Step 2: Convert the tibble data.frame into an sf object by using st_sf(). It is also important for us to include the referencing system of the data into the sf object.
wp_sf <- st_sf(wp_nga, crs=4326)
wp_sf
Simple feature collection with 5745 features and 74 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4.032004 ymin: 7.060309 xmax: 5.06 ymax: 8.061898
Geodetic CRS: WGS 84
# A tibble: 5,745 × 75
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
* <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 225950 Federal Minis… 7.43 4.26 05/05/… Yes Boreho… Well Hand P…
2 225524 Federal Minis… 7.78 4.56 04/22/… Yes Protec… Well Hand P…
3 197014 Federal Minis… 7.49 4.53 04/30/… Yes Boreho… Well Mechan…
4 225173 Federal Minis… 7.93 4.73 05/02/… Yes Boreho… Well Hand P…
5 225843 Federal Minis… 7.74 4.44 05/08/… Yes Boreho… Well Hand P…
6 235508 Federal Minis… 7.15 4.64 04/27/… Yes Protec… Well Hand P…
7 197708 Federal Minis… 7.87 4.72 05/13/… Yes Boreho… Well Mechan…
8 195041 Federal Minis… 7.73 4.45 06/17/… Yes Protec… Spring <NA>
9 225222 Federal Minis… 7.81 4.15 05/14/… Yes Protec… Spring Mechan…
10 460770 GRID3 7.4 4.33 06/13/… Unknown Boreho… Well <NA>
# … with 5,735 more rows, 66 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
3.3.2 Transforming into Nigeria Projected Coordinate System
wp_sf <- wp_sf %>%
st_transform(crs = 26392)
wp_sf
Simple feature collection with 5745 features and 74 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 177285.9 ymin: 340054.1 xmax: 291287.1 ymax: 450859.7
Projected CRS: Minna / Nigeria Mid Belt
# A tibble: 5,745 × 75
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
* <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 225950 Federal Minis… 7.43 4.26 05/05/… Yes Boreho… Well Hand P…
2 225524 Federal Minis… 7.78 4.56 04/22/… Yes Protec… Well Hand P…
3 197014 Federal Minis… 7.49 4.53 04/30/… Yes Boreho… Well Mechan…
4 225173 Federal Minis… 7.93 4.73 05/02/… Yes Boreho… Well Hand P…
5 225843 Federal Minis… 7.74 4.44 05/08/… Yes Boreho… Well Hand P…
6 235508 Federal Minis… 7.15 4.64 04/27/… Yes Protec… Well Hand P…
7 197708 Federal Minis… 7.87 4.72 05/13/… Yes Boreho… Well Mechan…
8 195041 Federal Minis… 7.73 4.45 06/17/… Yes Protec… Spring <NA>
9 225222 Federal Minis… 7.81 4.15 05/14/… Yes Protec… Spring Mechan…
10 460770 GRID3 7.4 4.33 06/13/… Unknown Boreho… Well <NA>
# … with 5,735 more rows, 66 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <lgl>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
From the output message, we learn that:
Geometry type of the wp_sf dataset is now POINT
5745 features and 74 fields
Projected CRS: Minna/Nigeria Mid Belt
Dimension: XY
We have successfully transformed the data!! :D
4. Data Pre-Processing
Before we can visualise our dataset and do the necessary analysis, we have to do data cleaning which is an important step in any data science task including geospatial data science. Things to check in the dataset:
Invalid geometries
Exclude redundancy
Missing value
Duplicate name
4.1 Check for Invalid Geometries
length(which(st_is_valid(NGA) == FALSE))
[1] 0
length(which(st_is_valid(wp_sf) == FALSE))
[1] 0
From the above generated output message, there are no invalid geometries! Great!
4.2 Exclude Redundancy
NGA <- NGA %>%
select(c(3:4, 8:9))
4.3 Check for Missing Value
NGA[rowSums(is.na(NGA))!=0,]
Simple feature collection with 0 features and 4 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: Minna / Nigeria Mid Belt
[1] ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE geometry
<0 rows> (or 0-length row.names)
The printout shows that there are zero missing values in the dataset!
4.4 Check for Duplicate Name
NGA$ADM2_EN[duplicated(NGA$ADM2_EN)==TRUE]
character(0)
Great!
Now, we are ready to analyse the dataset!
5. Data Wrangling for Water Point Data
Exploratory Data Analysis (EDA) is a popular approach to gain initial understanding of the data.
Firstly, we take a look at all the column names in the wp_sf dataset and identify the column we need to plot the status_clean frequency bar chart.
colnames(wp_sf)
 [1] "row_id" "#source"
[3] "#lat_deg" "#lon_deg"
[5] "#report_date" "#status_id"
[7] "#water_source_clean" "#water_source_category"
[9] "#water_tech_clean" "#water_tech_category"
[11] "#facility_type" "#clean_country_name"
[13] "#clean_adm1" "#clean_adm2"
[15] "#clean_adm3" "#clean_adm4"
[17] "#install_year" "#installer"
[19] "#rehab_year" "#rehabilitator"
[21] "#management_clean" "#status_clean"
[23] "#pay" "#fecal_coliform_presence"
[25] "#fecal_coliform_value" "#subjective_quality"
[27] "#activity_id" "#scheme_id"
[29] "#wpdx_id" "#notes"
[31] "#orig_lnk" "#photo_lnk"
[33] "#country_id" "#data_lnk"
[35] "#distance_to_primary_road" "#distance_to_secondary_road"
[37] "#distance_to_tertiary_road" "#distance_to_city"
[39] "#distance_to_town" "water_point_history"
[41] "rehab_priority" "water_point_population"
[43] "local_population_1km" "crucialness_score"
[45] "pressure_score" "usage_capacity"
[47] "is_urban" "days_since_report"
[49] "staleness_score" "latest_record"
[51] "location_id" "cluster_size"
[53] "#clean_country_id" "#country_name"
[55] "#water_source" "#water_tech"
[57] "#status" "#adm2"
[59] "#adm3" "#management"
[61] "#adm1" "New Georeferenced Column"
[63] "lat_deg_original" "lat_lon_deg"
[65] "lon_deg_original" "public_data_source"
[67] "converted" "count"
[69] "created_timestamp" "updated_timestamp"
[71] "#pay_clean" "#subjective_quality_clean"
[73] "is_duplicate" "dataset_title"
[75] "Geometry"
Now that we have the column name "#status_clean", we use the table() function to get the frequency of unique values in a vector.
This returns a frequency table of the unique values in wp_sf$`#status_clean` and their corresponding frequencies. The sort() function sorts the table by frequency.
sort(table(wp_sf$"#status_clean"), decreasing = TRUE)
Functional Non-Functional Functional, needs repair
2406 2086 259
Non-Functional, dry Functional, not in use Abandoned/Decommissioned
159 64 15
Functional but not in use
8
To plot a bar chart based on the frequency table, use the barplot() function in R.
The code below creates a bar plot of the frequency table, with the x-axis labelled "status_clean" and the y-axis labelled "Frequency". The main title of the plot will be "Bar Plot of status_clean".
#Set the colour scheme for the bar
colors <- c("grey","pink","purple","blue","green","yellow")
#Plot the frequency of status_clean in bar chart
freq_table <- sort(table(wp_sf$"#status_clean"), decreasing = TRUE)
barplot(freq_table, xlab = "status_clean", ylab = "Frequency", main = "Bar Plot of status_clean", col = colors)
The below code chunk will include percentage labels on the bar plot!
# calculate the percentage of each status_clean value
freq_table_pct <- round(100* prop.table(freq_table),2)
# plot the bar plot with the percentage labels
barplot(freq_table, xlab = "status_clean", ylab = "Frequency", main = "Bar Plot of status_clean (Percentage)", col = colors, las = 2)
text(x = 1:length(freq_table), y = freq_table + 0.5, labels = paste(freq_table_pct, "%"), pos = 3, cex = 0.7)
Next, the code chunk below will be used to perform the following data wrangling tasks: rename() of the dplyr package is used to rename the column from #status_clean to status_clean for easier handling in subsequent steps, while mutate() and replace_na() are used to recode all the NA values in status_clean to unknown.
wp_sf_nga <- wp_sf %>%
rename(status_clean = '#status_clean') %>%
select(status_clean) %>%
mutate(status_clean = replace_na(
status_clean, "unknown"))
5.1 Extracting Water Point Data
Now we are ready to extract the water point data according to their status.
The code chunk below is used to extract functional water point.
wp_functional <- wp_sf_nga %>%
filter(status_clean %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair"))
The code chunk below is used to extract non-functional water points.
wp_nonfunctional <- wp_sf_nga %>%
filter(status_clean %in%
c("Abandoned/Decommissioned",
"Abandoned",
"Non-Functional due to dry season",
"Non-Functional",
"Non functional due to dry season"))
The code chunk below is used to extract water points with unknown status.
wp_unknown <- wp_sf_nga %>%
filter(status_clean == "unknown")
5.2 Performing Point-in-Polygon Count
Next, we want to find out the number of total, functional, non-functional and unknown water points in each LGA. This is performed in the following code chunk.
First, st_intersects() of the sf package identifies the water points that fall in each LGA. Then, lengths() is used to count the number of water points inside each LGA.
NGA_wp <- NGA %>%
mutate(`total_wp` = lengths(
st_intersects(NGA, wp_sf_nga))) %>%
mutate(`wp_functional` = lengths(
st_intersects(NGA, wp_functional))) %>%
mutate(`wp_nonfunctional` = lengths(
st_intersects(NGA, wp_nonfunctional))) %>%
mutate(`wp_unknown` = lengths(
st_intersects(NGA, wp_unknown)))
Notice that four new derived fields have been added into the NGA_wp sf data.frame.
We can summarise the NGA_wp sf data frame with statistics such as the mean, median and max for both functional and non-functional water points by using summary(), as shown in the code chunk below:
summary(NGA_wp)
ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE
Length:30 Length:30 Length:30 Length:30
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
geometry total_wp wp_functional wp_nonfunctional
MULTIPOLYGON :30 Min. : 60.0 Min. : 14.00 Min. : 17.0
epsg:26392 : 0 1st Qu.:134.0 1st Qu.: 50.50 1st Qu.: 46.5
+proj=tmer...: 0 Median :166.5 Median : 75.00 Median : 58.0
Mean :182.9 Mean : 77.20 Mean : 65.9
3rd Qu.:212.0 3rd Qu.: 91.75 3rd Qu.: 86.5
Max. :443.0 Max. :249.00 Max. :140.0
wp_unknown
Min. : 4.00
1st Qu.:13.00
Median :22.00
Mean :24.47
3rd Qu.:32.75
Max. :78.00
5.3 Visualising attributes by using statistical graphs
ggplot(data = NGA_wp,
aes(x = total_wp)) +
geom_histogram(bins=20,
color="black",
fill="light blue") +
geom_vline(aes(xintercept=mean(
total_wp, na.rm=T)),
color="red",
linetype="dashed",
size=0.8) +
ggtitle("Distribution of total water points by LGA") +
xlab("No. of water points") +
ylab("No. of\nLGAs") +
theme(axis.title.y=element_text(angle = 0))
5.3.1 Observation from Statistical graph of NGA waterpoints
The histogram shows a right-skewed distribution where the long tail extends to the right while most values cluster on the left, as shown above in section 5.3.
There are a few possible outliers in the above histogram and we can use the 1.5 interquartile range (IQR) criterion to check whether they can be considered as outliers.
Q1 = 134 and Q3 = 212, which give an IQR = Q3 - Q1 = 78
Q1 - 1.5(IQR) = 134 - (1.5)(78) = 17
Q3 + 1.5(IQR) = 212 + (1.5)(78) = 329
The 1.5(IQR) criterion tells us that any observation with water points that is below 17 or above 329 is considered a suspected outlier.
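The fence arithmetic above can be checked with a quick base-R sketch, using the quartile values of total_wp reported by summary(NGA_wp) in section 5.2:

```r
# Quartiles of total_wp from summary(NGA_wp)
q1 <- 134
q3 <- 212

iqr <- q3 - q1                  # interquartile range: 78
lower_fence <- q1 - 1.5 * iqr   # observations below this are suspected outliers
upper_fence <- q3 + 1.5 * iqr   # observations above this are suspected outliers

c(IQR = iqr, lower = lower_fence, upper = upper_fence)
# IQR lower upper
#  78    17   329
```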

With that, we can conclude that there are three suspected outliers in this histogram (those marked with an arrow above).
5.4 Saving the analytical data in rds format
write_rds(NGA_wp, "data/rds/NGA_wp.rds")
6. Geospatial Mapping
6.1 Basic Choropleth Mapping
In this section, we will plot different choropleth maps to analyse the distribution of water points in Osun State, Nigeria.
tmap_mode("plot")
qtm(NGA_wp,
fill = c("wp_functional","wp_nonfunctional", "wp_unknown"))
p1 <- tm_shape(NGA_wp) +
tm_fill("wp_functional",
n = 10,
style = "equal",
palette = "Blues") +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "Distribution of functional water point by LGAs",
legend.outside = FALSE)
p2 <- tm_shape(NGA_wp) +
tm_fill("total_wp",
n = 10,
style = "equal",
palette = "Blues") +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "Distribution of total water point by LGAs",
legend.outside = FALSE)
tmap_arrange(p2, p1, nrow = 1)
6.1.1 Observation of the Distribution of Water Points in Osun State, Nigeria

From the maps generated in section 6.1, which show the distribution of water points in Osun State according to their functionality, we observe a few things:
At first glance, the non-functional water point map has a higher colour intensity across its spread than the two maps on its right and left. This could mean that non-functional water points are more widely spread across Osun State.
On the other hand, the functional water point map generally has the least colour intensity of the three, which could mean that functional water points are less common or less widely spread across Osun State.
In conclusion, Osun State has a higher level of non-functional than functional water points.
6.2 Choropleth Map for Rates: Deriving Proportion of Functional Water Points and Non-Functional Water Points
We will tabulate the proportion of functional water points and the proportion of non-functional water points in each LGA. In the following code chunk, mutate() from dplyr package is used to derive two fields, namely pct_functional and pct_nonfunctional.
NGA_wp <- NGA_wp %>%
mutate(pct_functional = wp_functional/total_wp) %>%
mutate(pct_nonfunctional = wp_nonfunctional/total_wp)
tm_shape(NGA_wp) +
tm_fill("pct_functional",
n = 10,
style = "equal",
palette = "Blues",
title = "Proportion functional",
legend.hist = TRUE) +
tm_borders(lwd = 0.1,
alpha = 1) +
tm_layout(main.title = "Rate map of functional water point by LGAs",
legend.outside = TRUE)
6.3 Extreme Value Maps: Percentile Map
Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (Anselin 1994).
The percentile map is a special type of quantile map with six specific categories: 0-1%, 1-10%, 10-50%, 50-90%, 90-99%, and 99-100%. The corresponding breakpoints can be derived by means of the base R quantile command, passing an explicit vector of cumulative probabilities as c(0,.01,.1,.5,.9,.99,1). Note that the beginning and end points need to be included.
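Before applying these breakpoints to a map variable, the cumulative-probability vector can be sanity-checked on any simple numeric vector with base R's quantile():

```r
percent <- c(0, .01, .1, .5, .9, .99, 1)

# On the integers 1 to 100, the 0% and 100% quantiles are the min and max,
# and the 50% quantile is the median (50.5)
quantile(1:100, percent)
```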
Step 1: Exclude records with NA by using the code chunk below
NGA_wp <- NGA_wp %>%
drop_na()
Step 2: Creating customised classification and extracting values
percent <- c(0,.01,.1,.5,.9,.99,1)
var <- NGA_wp["pct_functional"] %>%
st_set_geometry(NULL)
quantile(var[,1], percent)
0% 1% 10% 50% 90% 99% 100%
0.2179487 0.2224103 0.2900820 0.4145480 0.5076962 0.6203446 0.6441441
Step 3: Creating the get.var function
We will write an R function as shown below to extract a variable (e.g. wp_nonfunctional) as a vector out of an sf data.frame.
arguments:
vname: variable name (as character, in quotes)
df: name of sf data frame
returns:
- v: vector with values (without a column name)
get.var <- function(vname,df) {
v <- df[vname] %>%
st_set_geometry(NULL)
v <- unname(v[,1])
return(v)
}
Step 4: We will write a percentile mapping function by using the code chunk below.
percentmap <- function(vnam, df, legtitle=NA, mtitle="Percentile Map"){
percent <- c(0,.01,.1,.5,.9,.99,1)
var <- get.var(vnam, df)
bperc <- quantile(var, percent)
tm_shape(df) +
tm_polygons() +
tm_shape(df) +
tm_fill(vnam,
title=legtitle,
breaks=bperc,
palette="Blues",
labels=c("< 1%", "1% - 10%", "10% - 50%", "50% - 90%", "90% - 99%", "> 99%")) +
tm_borders() +
tm_layout(main.title = mtitle,
title.position = c("right","bottom"))
}
Step 5: Test drive the percentile mapping function
percentmap("total_wp", NGA_wp)
7. First-Order Spatial Point Patterns Analysis Methods
Visualising the sf layers
It is always good practice to plot the output sf layers on an OSM layer to ensure that they have been imported properly and projected with an appropriate coordinate system.
tmap_mode("view")
tm_shape(wp_functional) +
tm_dots(alpha = 0.5,
size=0.01,
border.col = "blue",
border.lwd = 0.5) +
tm_shape(wp_nonfunctional) +
tm_dots(alpha = 0.5,
size=0.01,
border.col = "yellow",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(8,12))
7.1 Converting SF Data Frames to SP’s Spatial Class
The code chunk below uses as_Spatial() of the sf package to convert the geospatial data from simple feature data frames to sp’s Spatial* class.
nga_sp <- as_Spatial(wp_sf_nga)
nga_sp
class : SpatialPointsDataFrame
features : 5745
extent : 177285.9, 291287.1, 340054.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 1
names : status_clean
min values : Abandoned/Decommissioned
max values : unknown
nga_functional_sp <- as_Spatial(wp_functional)
nga_functional_sp
class : SpatialPointsDataFrame
features : 2414
extent : 177285.9, 290751, 343128.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 1
names : status_clean
min values : Functional
max values : Functional but not in use
nga_nonfunctional_sp <- as_Spatial(wp_nonfunctional)
nga_nonfunctional_sp
class : SpatialPointsDataFrame
features : 2101
extent : 180539, 290546.5, 340054.1, 450780.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 1
names : status_clean
min values : Abandoned/Decommissioned
max values : Non-Functional
Notice from the output messages that the geospatial data wp_sf_nga, wp_functional, and wp_nonfunctional have all been converted to sp’s Spatial* class.
7.2 Converting the Spatial* Class into Generic SP Format
spatstat requires the analytical data in ppp object form. There is no direct way to convert a Spatial* class into a ppp object, so we need to convert the Spatial* classes into generic sp objects first.
The code chunk below converts the Spatial* class of NGA into generic sp object.
nga_sp <- as(nga_sp, "SpatialPoints")
nga_sp
class : SpatialPoints
features : 5745
extent : 177285.9, 291287.1, 340054.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
nga_functional_sp <- as(nga_functional_sp, "SpatialPoints")
nga_functional_sp
class : SpatialPoints
features : 2414
extent : 177285.9, 290751, 343128.1, 450859.7 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
nga_nonfunctional_sp <- as(nga_nonfunctional_sp, "SpatialPoints")
nga_nonfunctional_sp
class : SpatialPoints
features : 2101
extent : 180539, 290546.5, 340054.1, 450780.1 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
7.3 Converting the Generic SP Format into Spatstat’s ppp Format
Now, we will use as.ppp() function of spatstat to convert the spatial data into spatstat’s ppp object format.
nga_ppp <- as(nga_sp, "ppp")
nga_ppp
Planar point pattern: 5745 points
window: rectangle = [177285.9, 291287.05] x [340054.1, 450859.7] units
nga_functional_ppp <- as(nga_functional_sp, "ppp")
nga_functional_ppp
Planar point pattern: 2414 points
window: rectangle = [177285.9, 290750.96] x [343128.1, 450859.7] units
nga_nonfunctional_ppp <- as(nga_nonfunctional_sp, "ppp")
nga_nonfunctional_ppp
Planar point pattern: 2101 points
window: rectangle = [180538.96, 290546.54] x [340054.1, 450780.1] units
Let us then plot nga_ppp and examine the difference.
plot(nga_ppp)
plot(nga_functional_ppp)
plot(nga_nonfunctional_ppp)
We can also look at the summary statistics of the newly created ppp object by using the code chunk below.
summary(nga_ppp)
Planar point pattern: 5745 points
Average intensity 4.547987e-07 points per square unit
Coordinates are given to 2 decimal places
i.e. rounded to the nearest multiple of 0.01 units
Window: rectangle = [177285.9, 291287.05] x [340054.1, 450859.7] units
(114000 x 110800 units)
Window area = 1.2632e+10 square units
There are no warning messages about duplicates, but let’s do a double check before moving on :)
any(duplicated(nga_ppp))
[1] FALSE
From the generated output, we can confidently say that there is no duplication!
7.4 Creating owin object
When analysing spatial point patterns, it is good practice to confine the analysis to a geographical area, such as the Osun State boundary. In spatstat, an object called owin is specially designed to represent such a polygonal region.
The code chunk below is used to convert the NGA polygons into an owin object of spatstat.
Osun State, Nigeria only
library(sf)
library(spatstat)
# Convert the NGA sf polygons (Osun State LGAs) directly into an owin object;
# as.owin() works on sf polygon objects, so no intermediate conversion is needed
nga_owin <- as.owin(NGA)
# Plot the owin object to verify the study area boundary
plot(nga_owin)
The code chunk below is used to convert the geoNGA polygons into an owin object of spatstat.
Whole of Nigeria
library(sf)
library(spatstat)
# Project geoNGA to the Nigeria Mid Belt projected coordinate system first,
# so that the owin is in metres rather than degrees (consistent with EPSG:26392
# used throughout this study)
nga_poly_proj <- st_transform(geoNGA, crs = 26392)
# Convert the projected sf polygons into an owin object
nga_geometry_owin <- as.owin(nga_poly_proj)
# Plot the owin object to verify the national boundary
plot(nga_geometry_owin)
nga_ppp = nga_ppp[nga_owin]
plot(nga_ppp)
7.5 Kernel Density Estimation
In this section, we will learn how to compute the kernel density estimation (KDE). Some definitions:
Density: The amount of features or events per unit area
Density estimation: The construction of the density function from the observed data
Kernel: A window function fitted on each observation (weighted or unweighted) to determine the fraction of the observation used for density estimation at any location within the window
The code chunk below computes a kernel density by using the following configurations of density() of spatstat:
bw.diggle() automatic bandwidth selection method. Other recommended methods are bw.CvL(), bw.scott() or bw.ppl().
The smoothing kernel used is gaussian, which is the default. Other smoothing methods are: “epanechnikov”, “quartic” or “disc”.
The intensity estimate is corrected for edge effect bias by using method described by Jones (1993) and Diggle (2010, equation 18.9). The default is FALSE.
kde_nga_bw <- density(nga_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_nga_functional_bw <- density(nga_functional_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_nga_nonfunctional_bw <- density(nga_nonfunctional_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
The plot() function of Base R is then used to display the kernel density derived.
plot(kde_nga_bw)
plot(kde_nga_functional_bw)
plot(kde_nga_nonfunctional_bw)
bw <- bw.diggle(nga_ppp)
bw
sigma
175.2226
bw_functional <-bw.diggle(nga_functional_ppp)
bw_functional
sigma
263.5313
bw_nonfunctional <-bw.diggle(nga_nonfunctional_ppp)
bw_nonfunctional
sigma
296.0087
nga_ppp.km <- rescale(nga_ppp, 100, "km")
kde_nga.bw <- density(nga_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
kde_nga_functional.bw <- density(nga_functional_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
kde_nga_nonfunctional.bw <- density(nga_nonfunctional_ppp, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_nga.bw)
plot(kde_nga_functional.bw)
plot(kde_nga_nonfunctional.bw)
gridded_kde_nga_bw <- as.SpatialGridDataFrame.im(kde_nga.bw)
spplot(gridded_kde_nga_bw)
kde_nga.bw_rastor <- raster(gridded_kde_nga_bw)
kde_nga.bw_rastor
class      : RasterLayer
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 8.948485, 9.616045  (x, y)
extent     : 1765.032, 2910.438, 3314.347, 4545.201  (xmin, xmax, ymin, ymax)
crs        : NA
source     : memory
names      : v
values     : -6.500989e-17, 0.4183645  (min, max)
# Assign the CRS used earlier (WGS84 / UTM zone 30N); note that EPSG:3414 is
# Singapore SVY21 and would be wrong for data in Osun State, Nigeria
projection(kde_nga.bw_rastor) <- CRS("+init=EPSG:32630")
kde_nga.bw_rastor
class      : RasterLayer
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 8.948485, 9.616045  (x, y)
extent     : 1765.032, 2910.438, 3314.347, 4545.201  (xmin, xmax, ymin, ymax)
crs        : +init=EPSG:32630
source     : memory
names      : v
values     : -6.500989e-17, 0.4183645  (min, max)
tm_shape(kde_nga.bw_rastor) +
  tm_raster("v") +
  tm_layout(legend.position = c("right", "bottom"), frame = FALSE)
7.5.1 Observations from Kernel Density Map
- Describe the spatial patterns revealed by the kernel density map
From the density map generated, we can tell that water points are fairly evenly distributed across Osun State, Nigeria (densities mostly between 0.10 and 0.15), with only a few spots of more intense colour (between 0.25 and 0.30).
There are still many areas of Osun State with a density of zero, meaning those areas have no water points at all.
- Highlight the advantages of kernel density map over point map
It provides a continuous surface representing the estimated density of points at every location. This lets us see where water point density across Osun State is high or low far more precisely, whereas a point map simply displays individual points, which can make patterns hard to discern.
It helps identify hot spots or areas of high concentration, which is useful for spotting clusters of points or areas of interest. In this assignment, that means identifying areas of Osun State with a high intensity of water points.
It is often more visually appealing and easier to interpret than a point map, especially when dealing with large numbers of points.
In conclusion, a kernel density map can provide a more accurate representation of the spatial distribution of points and can be more useful for identifying patterns and hot spots in the data. However, a point map can still be useful for exploring the individual points and for understanding the distribution at the local scale. The choice between a kernel density map and a point map often depends on the specific needs of the analysis and the type of data being analyzed.
8. Second Spatial Point Patterns Analysis Methods
To confirm the observed spatial patterns above, a hypothesis test will be conducted. The hypotheses are as follows:
Ho = The water points in Osun State, Nigeria are randomly distributed
H1 = The water points in Osun State, Nigeria are not randomly distributed
Confidence level = 95%
8.1 Analysing Spatial Point Process Using G-Function
The code chunk below is used to compute the G-function using Gest() of the spatstat package.
G_nga_functional = Gest(nga_functional_ppp, correction = "border")
plot(G_nga_functional, xlim=c(0,1900))
G_nga_nonfunctional = Gest(nga_nonfunctional_ppp, correction = "border")
plot(G_nga_nonfunctional, xlim=c(0,1900))
8.1.1 Performing Complete Spatial Randomness Test
The null hypothesis will be rejected if the p-value is smaller than the alpha value of 0.001.
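The link between the number of simulations and the significance level can be checked with a little arithmetic (a sketch for a pointwise envelope built from the most extreme of nsim simulated curves):

```r
# Significance level of a rank-1 pointwise Monte Carlo envelope
nsim <- 999
rank <- 1
alpha_one_sided <- rank / (nsim + 1)        # 0.001
alpha_two_sided <- 2 * rank / (nsim + 1)    # 0.002
c(alpha_one_sided, alpha_two_sided)
```

With 999 simulations, an observed curve escaping the envelope is therefore significant at roughly the 0.001 level (one-sided).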
Monte Carlo test with G-function
G_nga_functional.csr <- envelope(nga_functional_ppp, Gest, nsim = 999)
G_nga_nonfunctional.csr <- envelope(nga_nonfunctional_ppp, Gest, nsim = 999)
plot(G_nga_functional.csr,xlim=c(0,1900))
plot(G_nga_nonfunctional.csr,xlim=c(0,1900))
8.1.2 Analysis of Spatial Point Pattern using G-Function
The G-function plot shows the edge-corrected empirical distribution of nearest-neighbour distances for the water points (the upper, black curve), adjusted for edge effects caused by the finite study area.
The lower curve shows the distribution expected for random uniform data of the same size on the same domain, and the shaded band is the simulation envelope, which gives a feel for the variation due to random sampling.
For both the functional and non-functional water points, the observed G curve lies above the envelope and the theoretical CSR curve: nearest-neighbour distances are shorter than expected under randomness, which indicates clustering rather than complete spatial randomness.
In summary, I reject the null hypothesis and conclude that the water points in Osun State, Nigeria are not randomly distributed.
The G-function is a useful first summary of a spatial point pattern based on nearest-neighbour distances; the L-function below provides a complementary check across a range of distances.
8.2 Analysing Spatial Point Process Using L-Function
According to lesson 4 slides,
L(r) > 0 indicates that the observed distribution is geographically concentrated (clustered)
L(r) < 0 implies dispersion
L(r) = 0 indicates complete spatial randomness (CSR)
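Before applying these cases to the water points, the behaviour of L(r) - r can be sketched on simulated toy patterns (a Matérn cluster process versus a homogeneous Poisson process; the parameter values are arbitrary):

```r
library(spatstat)
set.seed(42)
X_clustered <- rMatClust(kappa = 20, scale = 0.05, mu = 5)  # clustered toy pattern
X_random    <- rpoispp(100)                                 # CSR toy pattern
L_clust <- Lest(X_clustered, correction = "Ripley")
L_csr   <- Lest(X_random,    correction = "Ripley")
# Clustered pattern: L(r) - r clearly positive at small r;
# CSR pattern: L(r) - r stays close to zero.
max(L_clust$iso - L_clust$r)
max(abs(L_csr$iso - L_csr$r))
```

The clustered pattern shows a much larger positive deviation of L(r) - r than the Poisson pattern, which is the signature we look for in the Osun data.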
L_nga_functional = Lest(nga_functional_ppp, correction = "Ripley")
plot(L_nga_functional, . - r ~ r,
     ylab = "L(d)-r", xlab = "d(m)")
L_nga_nonfunctional = Lest(nga_nonfunctional_ppp, correction = "Ripley")
plot(L_nga_nonfunctional, . - r ~ r,
     ylab = "L(d)-r", xlab = "d(m)")
L_nga_nonfunctional.csr <- envelope(nga_nonfunctional_ppp, Lest, nsim = 39, rank = 1, global = TRUE)
L_nga_functional.csr <- envelope(nga_functional_ppp, Lest, nsim = 39, rank = 1, global = TRUE)
plot(L_nga_functional.csr, . - r ~ r, xlab="d", ylab="L(d)-r")
plot(L_nga_nonfunctional.csr, . - r ~ r, xlab="d", ylab="L(d)-r")
8.2.1 Analysis of Spatial Point using L-Function
For distances greater than about 1000 m, the observed L(d) - r function (black line) lies above the L(d) - r function under CSR (red line) and outside the envelope, indicating that the water points are clustered at those distances.
Therefore, I reject the null hypothesis and conclude that the water points in Osun State, Nigeria are not randomly distributed.
For both functions (G and L), the null hypothesis is rejected in favour of the alternative hypothesis.
9. Spatial Correlation Analysis
What is Local Colocation Quotients (LCLQ):
A point event of category A is colocated with point events of category B if it is surrounded by several category-B point events within a specified distance.
In this assignment, we are interested in finding out whether the spatial distributions of functional and non-functional water points are independent of each other.
To confirm the observed spatial correlation pattern, a hypothesis test will be conducted. The hypotheses are as follows:
Ho = The distributions of functional and non-functional water points are independent of each other
H1 = The distributions of functional and non-functional water points are not independent of each other
Confidence level = 95%
wp_sf_clean <- wp_sf_nga %>% filter(!status_clean == 'unknown')
nb <- include_self(st_knn(st_geometry(wp_sf_clean), 6))
wt <- st_kernel_weights(nb, wp_sf_clean, "gaussian", adaptive = TRUE)
f <- wp_sf_clean %>%
  filter(status_clean == "Functional")
A <- f$status_clean
nf <- wp_sf_clean %>%
  filter(status_clean == "Non-Functional")
B <- nf$status_clean
LCLQ <- local_colocation(A, B, nb, wt, 49)
LCLQ_wp <- cbind(wp_sf_clean, LCLQ)
Code breakdown:
The code first filters wp_sf_clean to remove any entries whose status_clean is "unknown". It then finds the 6 nearest neighbours of each feature with st_knn() of the sfdep package, including the feature itself via include_self(), and builds an adaptive Gaussian kernel weight matrix with st_kernel_weights(). Next, two vectors are created: A, the status labels of the "Functional" water points, and B, the labels of the "Non-Functional" ones. local_colocation() then computes the local colocation quotient of A relative to B using the neighbour list and weight matrix, with 49 simulations for significance testing. Finally, cbind() joins the LCLQ results back onto wp_sf_clean to create the new object LCLQ_wp.
tmap_mode("view")
tm_shape(NGA_wp) +
  tm_polygons() +
  tm_shape(LCLQ_wp) +
  tm_dots(col = c("Non.Functional"),
          size = 0.01,
          border.col = "black",
          border.lwd = 0.5) +
  tm_view(set.bounds = c(4, 7, 5, 8),
          set.zoom.limits = c(8, 13))
9.1 Analysis of Spatial Correlation
For each feature of the category of interest, the tool determines whether features of the neighbouring category are more or less present in its neighbourhood than the overall spatial distribution of the categories would suggest. For a feature of category A, a local colocation quotient (LCLQ) value of 1 means category B is as likely a neighbour as expected under randomness; a value greater than 1 means B is a more likely neighbour than random, and a value less than 1 means B is a less likely neighbour than random.
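The interpretation above can be made concrete with a small hand computation of the (global) colocation quotient, using made-up counts rather than the Osun data: CLQ_A->B = (C_AB / N_A) / (N_B / (N - 1)), where C_AB counts the category-A points whose nearest neighbour is of category B.

```r
# Hypothetical counts for illustration only (not the Osun data)
N    <- 100   # total points
N_A  <- 40    # category A points (e.g. functional)
N_B  <- 60    # category B points (e.g. non-functional)
C_AB <- 30    # A points whose nearest neighbour is a B point
clq  <- (C_AB / N_A) / (N_B / (N - 1))
clq   # 1.2375 -> greater than 1: A has B neighbours more often than chance
```

local_colocation() computes a kernel-weighted, local version of this ratio for each feature, but the above arithmetic is the core idea.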
According to the map generated, the proportion of A (functional points) within the neighbourhoods of B (non-functional points) is higher than the global proportion of A, so the colocation quotient is high: about 99% of the points have an LCLQ value of at least 1, meaning functional and non-functional points tend to colocate.
Therefore, I reject the null hypothesis and accept the alternative hypothesis: the distributions of functional and non-functional water points are not independent of each other.
10. References & Resources Used
Here is the list of resources used in this analysis, together with their links. Special thanks to seniors' work samples and Prof Kam for all the detailed explanations and clear documentation!
Section 5.3.1: https://bolt.mph.ufl.edu/6050-6052/unit-1/one-quantitative-variable-introduction/understanding-outliers/
Section 7.5: https://gistbok.ucgis.org/bok-topics/kernels-and-density-estimation
Section 8.1: https://blogs.sas.com/content/iml/2016/09/19/nearest-neighbor-distances.html
Section 8.2.1: https://www.mattpeeples.net/modules/PointPattern.html
Section 9.1: https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/colocationanalysis.htm